Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proc: initial (local) process manager implementation #7048

Merged
merged 2 commits into from
Nov 29, 2021

Conversation

pmrowla
Copy link
Contributor

@pmrowla pmrowla commented Nov 25, 2021

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Related to #6267

Local prerequisite for #7002

  • Adds dvc.proc module for managing background processes
  • Process state information is kept within the manager directory
  • Processes are backgrounded and output is redirected to a file (stdin interactivity is not currently supported)
    • Backgrounding is handled through multiprocessing.Process(daemon=True)

Functionality is not used anywhere else in DVC yet (exp tempdir/queue runs will be the first place it gets added)

Python shell demo:

asciicast

@pmrowla pmrowla added skip-changelog Skips changelog A: executors Related to the executors feature labels Nov 25, 2021
@pmrowla pmrowla requested a review from karajan1001 November 25, 2021 09:00
@pmrowla pmrowla requested a review from a team as a code owner November 25, 2021 09:00
@pmrowla pmrowla self-assigned this Nov 25, 2021
@pmrowla pmrowla requested a review from efiop November 25, 2021 09:01
Comment on lines +66 to +92
def send_signal(self, name: str, signal: int):
"""Send `signal` to the specified named process."""
raise NotImplementedError

def kill(self, name: str):
"""Kill the specified named process."""
raise NotImplementedError

def terminate(self, name: str):
"""Terminate the specified named process."""
raise NotImplementedError

def remove(self, name: str, force: bool = False):
"""Remove the specified named process from this manager.

If the specified process is still running, it will be forcefully killed
if `force` is True`, otherwise an exception will be raised.

Raises:
ProcessNotTerminatedError if the specified process is still
running and was not forcefully killed.
"""
raise NotImplementedError

def cleanup(self):
"""Remove stale (terminated) processes from this manager."""
raise NotImplementedError
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@karajan1001 the next set of tasks will be to fill out these methods.

  • send_signal, terminate, kill are all related and would be grouped together in the first task. Implementation/behavior for these methods should work the same as they do in subprocess and multiprocessing (https://docs.python.org/3/library/subprocess.html#subprocess.Popen.send_signal).
  • remove and cleanup can be implemented once terminate/kill are done (since force-removing a child process directory will require killing the process if it is still running)

@pmrowla pmrowla changed the title proc: initial (local) process manager implementation [WIP] proc: initial (local) process manager implementation Nov 25, 2021
proc.start()
# Do not terminate the child daemon when the main process exits
# pylint: disable=protected-access
mp.process._children.discard(proc) # type: ignore[attr-defined]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of a hack, but even if we subclassed multiprocessing.Process we would just end up with an overridden

def start(self):
    super().start()
    _children.discard(self)

which I'm not sure is any cleaner

But in general, the idea behind using multiprocessing is that it allows us to use spawn instead of fork on windows + mac (#4294), and also allows us to directly run the target python function we want instead of adding the overhead from using dvc.daemon + a new dedicated CLI command

@pmrowla pmrowla changed the title [WIP] proc: initial (local) process manager implementation proc: initial (local) process manager implementation Nov 25, 2021
Comment on lines +26 to +41
def __iter__(self):
return self.processes()

def __getitem__(self, key: str) -> "ProcessInfo":
info_path = os.path.join(self.wdir, key, f"{key}.json")
try:
with open(info_path, encoding="utf-8") as fobj:
return ProcessInfo.from_dict(json.load(fobj))
except FileNotFoundError:
raise KeyError

def get(self, key: str, default=None):
try:
return self[key]
except KeyError:
return default
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also in the future we probably want this to just mirror the standard dict/Mapping interface (and in processes()) so that we can return the paired name + info as needed instead of just the info from this initial implementation

dvc/proc/process.py Outdated Show resolved Hide resolved
- supports running (backgrounded) processes with redirected output
@efiop efiop merged commit dd535f6 into iterative:master Nov 29, 2021
@pmrowla pmrowla deleted the process-manager branch November 29, 2021 06:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: executors Related to the executors feature skip-changelog Skips changelog
Projects
No open projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants